Abstract

A multi-input Muller C-element is frequently used for joining signal transitions or for
completion time detection in self-timed circuits. This paper presents an n-input Muller
C-element design which uses the multi-level logic design technique and has a symmetric format
for any integer n > 2. In comparison with series-parallel MOS structure implementations
and C-element tree implementations, our design has fewer restrictions in terms of n, less path
delay, less delay variance from inputs to output, and less area consumption. Experimental
validation in this paper is based on an industrial standard cell library.

1  Introduction 

A Muller C-element [5] is used as a basic component in the design of speed-independent
circuits. A C-element is functionally equivalent to an SR latch. Under the assumption of
unbounded gate delays it is not possible to guarantee that S and R will not be 1
simultaneously. This problem does not arise with a C-element [4]. The output of a two-input
C-element will equal the value of the inputs after both inputs have reached the same value;
otherwise the output remains unchanged. That is, if $i_1$ and $i_2$ are the two inputs and $O$ is
the output, then the defining equation of the C-element is
$O = i_1 \cdot i_2 + O \cdot i_1 + O \cdot i_2$ [5]. A two-input C-element can be viewed as a
logical and of two events, where an event can be a 0-1 or 1-0 transition [7]. This behaviour
is shown in Figure 1.

A C-element is commonly used for joining signal transitions to signal the completion of
an operation [1, 2, 3, 4, 6]. For example, the 16-output computational block in Figure 2 will
require a 16-input C-element to join all 16 completion signals in order to generate one signal
transition to indicate the completion of the block.

The output of an n-input C-element is 1 if all the inputs are 1 and it is 0 if all the inputs
are 0; otherwise its value remains unchanged [6]. The state diagrams for two-input and
three-input C-elements are shown in Figure 3, where each state is labeled "inputs/output".
In its initial state, a C-element has all inputs and its output at zero. This state is denoted
by the double circle in the state diagram.
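
The behaviour described above can be sketched as a small behavioural model. This is only an illustration of the stated rule (set when all inputs are 1, clear when all inputs are 0, hold otherwise); the function names `c_element_next` and `run` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def c_element_next(inputs: Sequence[int], out: int) -> int:
    """Return the next output of an n-input Muller C-element."""
    if all(v == 1 for v in inputs):
        return 1      # every input is 1: the output rises
    if all(v == 0 for v in inputs):
        return 0      # every input is 0: the output falls
    return out        # inputs disagree: hold the previous output


def run(trace: Sequence[Sequence[int]], out: int = 0) -> List[int]:
    """Apply input vectors in order, starting from the all-zero initial state."""
    history = []
    for vec in trace:
        out = c_element_next(vec, out)
        history.append(out)
    return history


# Three-input example: the output rises only after the last input rises
# and falls only after the last input falls.
trace = [(1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (0, 0, 1), (0, 0, 0)]
print(run(trace))
```

For two inputs the same function agrees with the defining equation O = i1·i2 + O·i1 + O·i2 for every combination of inputs and previous output.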

Figure 1: Timing diagrams for 2-input Muller C-elements

(a) A typical self-timed computational block
(b) Timing diagram of the completion/reset signals
Figure 2: Join of completion/reset signals

(a) Two-input C-element  (b) Three-input C-element
Figure 3: State diagrams for Muller C-elements

(a) Series-parallel MOS structure  (b) C-tree structure
Figure 4: Three-input series-parallel C-element designs

C-elements with large numbers of inputs are very useful, and an efficient implementation
in terms of area and speed is needed. Two designs of multi-input C-elements have been used:
a series-parallel MOS structure [6], shown in Figure 4(a), and the C-element tree [4, 6], shown
in Figure 4(b). Because the C-element is associative, the tree implementation uses (n - 1)
two-input C-elements to form an n-input C-element. The input-output delay of the
series-parallel MOS structure is less than that of the tree structure. However, the series-parallel
implementation is not feasible for large n. For this reason, most existing designs employ the
tree structure. The main disadvantages of the tree implementation are that it is very slow
and that the variance in the delays over different input-output paths is very large.

In  this  paper  we  present  an  efficient  design  of  a  multi-input  C-element.  Our  design  is 
symmetric  and  the  variance  in  delay  over  different  input-output  paths  is  very  small.  In 
Section  2  we  derive  a  symmetric  form  of  an  n-input  C-element  and  provide  estimates  of 
delay  and  area.  In  Section  3  we  demonstrate  the  advantages  of  our  design  by  presenting 
experimental  results  for  C-elements  with  inputs  ranging  from  2  to  128. 


2 Design of a Multi-input C-element

The  target  technology  of  our  design  is  CMOS.  Since  inverted  logic  is  faster  than  non-inverted 
logic  in  CMOS,  we  will  use  and-or-invert  (AOI)  logic  and  inverters  instead  of  and-or  logic. 
The  defining  equation  of  an  n-input  C-element  with  a  reset  is  given  by 

\[
\mathrm{OUT} = \bigl((I_1 \cdot I_2 \cdot \ldots \cdot I_n) + (I_1 + I_2 + \ldots + I_n) \cdot \mathrm{OUT}\bigr) \cdot \overline{\mathrm{RESET}} \tag{1}
\]

where $I_i$, for $i = 1, \ldots, n$, are the inputs of the C-element, and OUT is the output of the
C-element. Using DeMorgan's law we transform Equation (1) as follows:

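\begin{align}
\mathrm{OUT} &= \bigl((I_1 \cdot I_2 \cdot \ldots \cdot I_n) + (I_1 + I_2 + \ldots + I_n) \cdot \mathrm{OUT}\bigr) \cdot \overline{\mathrm{RESET}} \tag{2} \\
&= \overline{\overline{\bigl((I_1 \cdot I_2 \cdot \ldots \cdot I_n) + (I_1 + I_2 + \ldots + I_n) \cdot \mathrm{OUT}\bigr) \cdot \overline{\mathrm{RESET}}}} \notag \\
&= \overline{\overline{(I_1 \cdot I_2 \cdot \ldots \cdot I_n) + (I_1 + I_2 + \ldots + I_n) \cdot \mathrm{OUT}} + \mathrm{RESET}} \notag \\
&= \overline{\overline{(I_1 \cdot I_2 \cdot \ldots \cdot I_n)} \cdot \overline{(I_1 + I_2 + \ldots + I_n) \cdot \mathrm{OUT}} + \mathrm{RESET}} \notag \\
&= \overline{\overline{(I_1 \cdot I_2 \cdot \ldots \cdot I_n)} \cdot \bigl(\overline{(I_1 + I_2 + \ldots + I_n)} + \overline{\mathrm{OUT}}\bigr) + \mathrm{RESET}} \tag{3}
\end{align}

Equation (3) can be further decomposed into the following equations:

\begin{align}
\mathrm{NAND\_TREE} &= \overline{(I_1 \cdot I_2 \cdot \ldots \cdot I_n)} \tag{4} \\
\mathrm{NOR\_TREE} &= \overline{(I_1 + I_2 + \ldots + I_n)} \tag{5} \\
\mathrm{OUT} &= \overline{\mathrm{NAND\_TREE} \cdot (\mathrm{NOR\_TREE} + \overline{\mathrm{OUT}}) + \mathrm{RESET}} \tag{6}
\end{align}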
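
As a sanity check on this decomposition, the sketch below compares Equation (1) with Equations (4) to (6) exhaustively for small n. It assumes the reading used here: RESET is active high and forces OUT to 0, NAND_TREE and NOR_TREE are the complemented AND and OR of the inputs, and the fed-back output enters the OAO part in complemented form. The function names are hypothetical.

```python
from itertools import product


def out_eq1(inputs, out, reset):
    """Equation (1) as read here: (AND(I) + OR(I) * OUT) * NOT(RESET)."""
    return int((all(inputs) or (any(inputs) and out)) and not reset)


def out_eq6(inputs, out, reset):
    """Equations (4)-(6) as read here, using inverting stages only."""
    nand_tree = int(not all(inputs))      # Equation (4)
    nor_tree = int(not any(inputs))       # Equation (5)
    out_bar = int(not out)                # inverter on the fed-back output
    # Equation (6): or-and-or-invert of NOR_TREE, NOT(OUT), NAND_TREE, RESET
    return int(not ((nand_tree and (nor_tree or out_bar)) or reset))


def equivalent(n):
    """Compare both forms over all inputs, previous outputs and reset values."""
    for bits in product((0, 1), repeat=n + 2):
        *inputs, out, reset = bits
        if out_eq1(inputs, out, reset) != out_eq6(inputs, out, reset):
            return False
    return True


print(all(equivalent(n) for n in range(2, 7)))
```

The check is purely combinational: it treats the previous output as one more input, which is enough to confirm that the two next-state functions agree.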

Figure 5: C-element design

(a) Two-level NAND_TREE implementation
(b) Three-level NAND_TREE implementation
Figure 6: Multi-level NAND_TREE implementation

The above decomposition is very useful when it is mapped to a CMOS cell-based
implementation. In Figure 5 we show a C-element design consisting of three parts: a NAND_TREE, a
NOR_TREE, and an OAO_PART (or-and-or), each implemented separately. The OAO_PART
shown in Figure 5 remains the same for all values of n.

Therefore, we only need to find a proper NAND_TREE/NOR_TREE implementation. In
the HP C34000 library [8], there are n-input NAND and NOR gates for 2 < n < 8. For
larger n, the NAND_TREE (NOR_TREE) can be further decomposed into a two-level NAND-OR
tree (NOR-AND tree). When a two-level structure is not sufficient, we can use more levels,
e.g., a nand-nor-nand three-level structure to implement a NAND_TREE or a nor-nand-nor
structure to implement a NOR_TREE. Figures 6(a) and (b) show the two-level and
the three-level implementations of the NAND_TREE. In order to minimize the variance of the
input-output delays, the structure of the NAND_TREE implementation is identical to the
structure of the NOR_TREE implementation.
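
The two-level and three-level structures can be checked against a single wide NAND with a short sketch. The grouping of inputs into consecutive chunks and the fan-in parameter `m` are assumptions made for illustration; the paper does not specify how inputs are assigned to gates.

```python
from itertools import product


def nand(bits):
    return int(not all(bits))


def nor(bits):
    return int(not any(bits))


def chunks(values, m):
    """Split signals into consecutive groups of at most m."""
    return [values[i:i + m] for i in range(0, len(values), m)]


def nand_tree_two_level(inputs, m):
    """NAND gates on groups of inputs, followed by an OR of the group outputs."""
    level1 = [nand(g) for g in chunks(inputs, m)]
    return int(any(level1))


def nand_tree_three_level(inputs, m):
    """nand-nor-nand structure; the last gate stays within fan-in m if n <= m**3."""
    level1 = [nand(g) for g in chunks(inputs, m)]
    level2 = [nor(g) for g in chunks(level1, m)]
    return nand(level2)


def check(n, m):
    """Compare both structures with a single n-input NAND over all inputs."""
    for bits in product((0, 1), repeat=n):
        expected = nand(bits)
        if nand_tree_two_level(bits, m) != expected:
            return False
        if nand_tree_three_level(bits, m) != expected:
            return False
    return True


print(all(check(n, 3) for n in range(2, 10)))
```

Swapping `nand` and `nor` (and the final OR for an AND) gives the matching NOR_TREE structures, which is what keeps the two trees structurally identical.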

We now provide estimates of delay and area for our design and compare them with the
tree implementation of a C-element. A comparison with a series-parallel MOS structure is
unnecessary since such an implementation is not feasible for large numbers of inputs.

Delay: For an n-input C-element, the input-output delay of a C-element tree implementation
is equal to the number of levels in the structure multiplied by the input-output delay
of a 2-input C-element, where the number of levels is $\lceil \log_2 n \rceil$, i.e.,

\begin{align}
D_{tree}(n) &= \lceil \log_2 n \rceil \times (\text{2-input Muller C-element delay}) \tag{7} \\
&\approx \lceil \log_2 n \rceil \times (\text{2-input NAND/NOR delay} + \text{OAOI delay}) \tag{8} \\
&\approx \lceil \log_2 n \rceil \times \text{OAOI delay} + \lceil \log_2 n \rceil \times \text{2-input NAND/NOR delay} \notag
\end{align}

The input-output delay of our design is

\[
D_{multi}(n) = \text{OAOI delay} + \text{n-input NAND\_TREE/NOR\_TREE delay} \tag{9}
\]

The n-input NAND_TREE/NOR_TREE can be implemented by $\lceil \log_2 n \rceil$ stages of a
NAND2-NOR2 tree, and can be made faster by using $\lceil \log_m n \rceil$ stages of a NANDm-NORm tree.
Therefore,

\[
D_{multi}(n) \le \text{OAOI delay} + \lceil \log_2 n \rceil \times \text{2-input NAND/NOR delay} \tag{10}
\]

Obviously, our design is much faster than the C-element tree implementation for n > 2, although
both $D_{tree}(n)$ and $D_{multi}(n)$ are $O(\log n)$.
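
To make the comparison concrete, the sketch below evaluates the two delay estimates. The gate delays passed in are hypothetical unit values chosen only for illustration, not library data, and the model assumes an m-input gate has the same delay as the value given for `d_gate`.

```python
def ceil_log(n, base):
    """Smallest k with base**k >= n, using integers to avoid rounding error."""
    k, reach = 0, 1
    while reach < n:
        reach *= base
        k += 1
    return k


def d_tree(n, d_gate2, d_oaoi):
    """Estimates (7)-(8): every tree level costs one full 2-input C-element."""
    return ceil_log(n, 2) * (d_gate2 + d_oaoi)


def d_multi(n, d_gate, d_oaoi, m=2):
    """Estimates (9)-(10): one OAOI stage plus a NANDm/NORm tree."""
    return d_oaoi + ceil_log(n, m) * d_gate


# Hypothetical delays (arbitrary units): NAND/NOR gate = 1.0, OAOI = 2.0
for n in (2, 8, 32, 128):
    print(n, d_tree(n, 1.0, 2.0), d_multi(n, 1.0, 2.0), d_multi(n, 1.0, 2.0, m=4))
```

With these values the two estimates coincide at n = 2 and diverge as n grows, because the tree pays the OAOI delay once per level while the proposed structure pays it once in total.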

Delay Variance: In order to have less delay variance among the input-output paths in
our design, two sufficient conditions need to be met.

1. Transistors in the OAOI element of the OAO_PART must be sized so that the delay
from one OR gate input to the output and the delay from one AND gate input to the
output are the same.

2. The structure of the NAND_TREE should be symmetric and identical to the structure of
the NOR_TREE.

These two conditions are easily satisfied for cell-based designs, and they are a property of
the HP standard cell library. However, the tree implementation of an n-input C-element is
balanced only if $n = 2^k$ for some integer k. For this reason, the variation in delays among
the different input-output paths of the tree is very large.
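
The imbalance of the tree for other values of n can be illustrated by counting how many two-input C-elements each input passes through. The sketch assumes the tree is built by splitting the inputs as evenly as possible at every level, which is only one possible construction; the paper does not state how its trees were built.

```python
def leaf_depths(n):
    """Number of 2-input C-elements between each input and the output,
    for a tree that splits its inputs as evenly as possible (assumption)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return [0]
    upper = (n + 1) // 2   # larger half of the inputs
    lower = n // 2         # smaller half of the inputs
    return [d + 1 for d in leaf_depths(upper) + leaf_depths(lower)]


for n in (4, 5, 8, 9):
    depths = leaf_depths(n)
    print(n, min(depths), max(depths))
```

Under this construction all paths have equal depth only when n is a power of two; otherwise some inputs pass through one more C-element than others, which adds a full C-element delay to those paths.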

Area: In comparing our design with the C-element tree implementation, the routing area
is not considered. As with the delay comparison above, the C-element tree has a large
overhead in terms of size due to the repeated use of the OAO_PART in every two-input
C-element in the tree. An estimate of the area of a C-element tree implementation is

\begin{align}
A_{tree}(n) &\approx (n-1) \times (A_{NAND2} + A_{NOR2} + A_{OAOI} + A_{INV}) \tag{11} \\
&\approx (n-1) \times (A_{OAOI} + A_{INV}) + (n-1) \times A_{NAND2} + (n-1) \times A_{NOR2} \notag
\end{align}

The estimate of area for our design is

\[
A_{multi}(n) \approx A_{OAOI} + A_{INV} + A_{\text{n-input NAND\_TREE}} + A_{\text{n-input NOR\_TREE}} \tag{12}
\]

The n-input NAND_TREE/NOR_TREE can be implemented by (n - 1) NAND2-NOR2 elements,
and it may have a similar sized implementation by using $\lceil \log_m n \rceil$ stages of a
NANDm-NORm tree. Therefore,

\[
A_{multi}(n) \approx A_{OAOI} + A_{INV} + (n-1) \times A_{NAND2} + (n-1) \times A_{NOR2}
\]

Obviously, our design is much smaller than the C-element tree implementation for n > 2,
although both $A_{tree}(n)$ and $A_{multi}(n)$ are $O(n)$.
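
The two area estimates differ only in how often the OAOI element and inverter are instantiated, which a short sketch makes explicit. The cell areas used below are hypothetical integers in arbitrary units, not values from the cell library.

```python
def a_tree(n, a_nand2, a_nor2, a_oaoi, a_inv):
    """Estimate (11): n - 1 complete two-input C-elements."""
    return (n - 1) * (a_nand2 + a_nor2 + a_oaoi + a_inv)


def a_multi(n, a_nand2, a_nor2, a_oaoi, a_inv):
    """Final estimate for the proposed design: one OAO part plus two gate trees."""
    return a_oaoi + a_inv + (n - 1) * a_nand2 + (n - 1) * a_nor2


# Hypothetical cell areas (arbitrary units)
cells = dict(a_nand2=4, a_nor2=4, a_oaoi=10, a_inv=2)
for n in (2, 8, 32):
    saved = a_tree(n, **cells) - a_multi(n, **cells)
    print(n, a_tree(n, **cells), a_multi(n, **cells), saved)
```

The saving equals (n - 2) times the combined OAOI and inverter area, so both estimates grow linearly in n but with different slopes.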

3  Experimental  results 

We present experimental results using the HP C34000 standard cell library [8]. The
input-output path delay is obtained by simulating a Verilog¹ model distributed with this cell
library. Wiring capacitances are included in the model. For each n-input C-element, both
the rise and fall delays for all input-output paths are simulated, i.e., 2n path delays are
collected. The path delay of an n-input C-element is taken to be the average of these 2n
delays. The delay variance among input-output paths is calculated as the difference between
the maximum input-output path delay and the minimum input-output path delay over all
input-output paths. The area of a design is computed as the sum of the areas of all cells in
the design. The results are shown in Table 1. In this table, a Muller C-element name
suffixed with “A” is the series-parallel structure implementation, a name suffixed with “B”
is our design, a name suffixed with “AT” is the C-element tree implementation using
“mullerC2A”, and a name suffixed with “BT” is the C-element tree implementation using
“mullerC2B”.
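
The reported metrics can be computed from a list of simulated path delays as sketched below. The sample delays are hypothetical values for a 2-input element (2n = 4 paths), not simulation results, and the function name `summarize` is an assumption.

```python
from statistics import mean


def summarize(path_delays):
    """Return min, max, average path delay and the max - min spread (nsec)."""
    lo, hi = min(path_delays), max(path_delays)
    return {
        "min": lo,
        "max": hi,
        "avg": mean(path_delays),   # "path delay" as defined in the text
        "spread": hi - lo,          # "delay variance" as defined in the text
    }


# Hypothetical rise/fall delays for the two inputs of a 2-input C-element
sample = [2.8, 3.0, 3.4, 3.6]
print(summarize(sample))
```

Note that the "delay variance" used in the paper is a range (maximum minus minimum), not a statistical variance.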

¹ Verilog is a hardware description language, and it is a trademark of Cadence Design Systems, Inc.


Table 1

Name           min (nsec)  max (nsec)  avg (nsec)  max - min (nsec)  area
mullerC2A      2.8         3.6         3.2         0.8               3319
mullerC2B      2.9         .2                                        4
mullerC3B      3.0         3.1                                       4596
mullerC4B      3.0         3.3                                       5107
mullerC5B      3.0         3.5                                       5874
mullerC6B      3.9         4.5                                       6384
mullerC7B      3.9         4.8                                       6894
mullerC8B      3.9         .6                                        7661
mullerC9B      3.9         4.4                                       6384
mullerC16B     3.9         .4                                        16343
mullerC32B     4.2         .8                                        32175
mullerC64B     4.4         .9                                        61287
mullerC128B    5.3         6.0                                       122063
mullerC2AT                                                           3319
mullerC3AT                                                           6638
mullerC4AT                                                           9957
mullerC5AT                                                           13276
mullerC6AT                                                           16595
mullerC7AT                                                           19914
mullerC8AT                                                           23233
mullerC9AT                                                           26552
mullerC16AT                                                          49785
mullerC32AT                                                          102889
mullerC64AT                                                          209097
mullerC128AT                                                         421513
mullerC2BT                                                           4085
mullerC3BT                                                           8170
mullerC4BT                                                           12255
mullerC5BT                                                           16340
mullerC6BT                                                           20425
mullerC7BT                                                           24510
mullerC8BT                                                           28595
mullerC9BT                                                           32680
mullerC16BT                                                          61275
mullerC32BT                                                          122635
mullerC64BT                                                          257355
mullerC128BT                                                         518795

Blank cells and entries with a missing digit (.2, .6, .4, .8, .9, and the area 4) are
illegible in the source. Further delay values (nsec) for the tree implementations survive
without row labels: 11.6, 9.0, 11.9, 10.3, 11.8, 11.1, 12.0, 11.7, 15.8, 12.6, 16.5, 16.1,
21.1, 20.4, 25.6, 24.7, 30.1, 29.1.

Figure 7 shows the path delays in terms of $\log_2 n$. As expected, the path delay of the
C-element tree is approximately equal to $\lceil \log_2 n \rceil \times K$, where K is the delay of a
2-input C-element including wiring delay. Our design is much faster than the C-element tree
implementation, and its path delay grows very slowly as n increases. For example, our 32-input
C-element design is 3.89 to 4.43 times faster than the C-element tree design. Figure 8 shows the
delay variance among input-output paths in terms of n. The delay variance of our design is
small for any number of inputs, but the delay variance of the tree implementation is
small only for a balanced tree, i.e., where $n = 2^k$ for some integer k. Finally, the area
comparison between our design and the C-element tree implementation is shown in Figure 9.
In both our design and the C-tree implementation, the cell area grows linearly with n. However,
the increasing area in our design is due to its NAND_TREE and NOR_TREE, whereas the
increasing area for the C-element tree implementation is due to the number of 2-input
C-elements. Therefore, the rate of increase of area for our design is much smaller than for the
tree implementation. For example, our 32-input design is 3.20 to 3.81 times smaller than the
C-element tree implementation.
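
The quoted area ratios for n = 32 can be reproduced from the area figures listed with the experimental results. The sketch assumes those figures belong to mullerC32B, mullerC32AT and mullerC32BT in the order they appear, and that 3319 is the area of mullerC2A; the constant names are hypothetical.

```python
AREA_C32B = 32175      # proposed 32-input design
AREA_C32AT = 102889    # 32-input tree built from mullerC2A
AREA_C32BT = 122635    # 32-input tree built from mullerC2B
AREA_C2A = 3319        # one series-parallel 2-input C-element


def ratio(tree_area, design_area):
    """How many times smaller the proposed design is than the tree."""
    return round(tree_area / design_area, 2)


print(ratio(AREA_C32AT, AREA_C32B), ratio(AREA_C32BT, AREA_C32B))

# The AT tree area is consistent with n - 1 copies of the 2-input element.
print(AREA_C32AT == (32 - 1) * AREA_C2A)
```

Both ratios match the range stated in the text, and the tree area matches the (n - 1) scaling used in the area estimate.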


Figure 7: Comparison of path delay (horizontal axis: number of inputs, log2 n)

Figure 8: Comparison of delay variance (vertical axis: delay variance, nsec)

(Axis label: cell area, micron square)

4 Conclusions

We have presented a new design for an n-input C-element, which is faster, has less delay
variance among input-output paths, and is smaller than the C-element tree implementation.
We have also demonstrated the advantages of our design using an industrial standard cell
library.